On the Application of Locality Sensitive Hashing to Behavioral Web Analytics
Abstract
In today’s constantly connected world, dependence on web-based technologies is ubiquitous, creating opportunities for both malicious and benign activity. As a result, it is essential to be able to identify users on the web. Simple methods exist, such as tracking a user by user ID or by IP address, but a user who wishes to can easily evade them by creating multiple IDs or operating from different IP addresses. However, out of habit, users seldom change the way they browse the web, such as the time of day at which they visit various genres of pages. In this project, we evaluate locality sensitive hashing (LSH) as a means of uniquely identifying and authenticating users based only on the day of the week, the time of day, and the genres of the websites they visit. In addition, we provide a novel extension of LSH, Mode Closest Hash (MCH).
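As a rough illustration of the idea in the abstract, the sketch below treats each visit as a (day of week, hour, genre) token and uses MinHash with banding, one common LSH scheme, to find stored profiles that resemble a new session. The abstract does not say which hash family the authors use, so MinHash is an assumption here, and every name (`visit_token`, `minhash_signature`, `build_buckets`, `candidates`) and parameter value is hypothetical. MCH is not sketched because the abstract does not define it.

```python
import hashlib
from collections import defaultdict


def visit_token(day, hour, genre):
    """Encode one page visit as a single string token (hypothetical encoding)."""
    return f"{day}|{hour}|{genre}"


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    digest = hashlib.sha1(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(tokens, num_hashes=32):
    """MinHash signature of a non-empty collection of tokens."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def build_buckets(signatures, bands=8):
    """Split each user's signature into bands and bucket users by band content."""
    buckets = defaultdict(set)
    for user, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(user)
    return buckets


def candidates(query_sig, buckets, bands=8):
    """Return users sharing at least one identical band with the query."""
    rows = len(query_sig) // bands
    found = set()
    for b in range(bands):
        key = (b, tuple(query_sig[b * rows:(b + 1) * rows]))
        found |= buckets.get(key, set())
    return found


# Made-up browsing profiles, for illustration only.
profiles = {
    "alice": {visit_token("Mon", 9, "news"), visit_token("Mon", 21, "video"),
              visit_token("Sat", 11, "shopping")},
    "bob": {visit_token("Tue", 2, "games"), visit_token("Fri", 23, "forums"),
            visit_token("Sun", 15, "sports")},
}
signatures = {user: minhash_signature(toks) for user, toks in profiles.items()}
buckets = build_buckets(signatures)

# A new session that repeats alice's habits exactly.
session = set(profiles["alice"])
matches = candidates(minhash_signature(session), buckets)
```

Sessions with high Jaccard overlap tend to agree on at least one band and so surface as candidates, while the number of bands and rows trades recall against false matches. A real system would still need a verification step on the candidates before authenticating anyone.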
Similar resources
A Layered Locality Sensitive Hashing based Sequence Similarity Search Algorithm for Web Sessions
In this article we propose a Layered Locality Sensitive Hashing Algorithm to perform similarity search on web log sequence data. Locality Sensitive Hashing has been found to be an efficient technique for approximate nearest neighbor search over a large database, as it has sub-linear dependence on the data size even for high-dimensional data. Mining the large web log data to provide customised ...
Streaming First Story Detection with application to Twitter
With the recent rise in popularity and size of social media, there is a growing need for systems that can extract useful information from this amount of data. We address the problem of detecting new events from a stream of Twitter posts. To make event detection feasible on web-scale corpora, we present an algorithm based on locality-sensitive hashing which is able to overcome the limitations of tr...
Trading accuracy for faster entity linking
Named entity linking (NEL) can be applied to documents such as financial reports, web pages and news articles, but state-of-the-art disambiguation techniques are currently too slow for web-scale applications because of their high complexity with respect to the number of candidates. In this paper, we accelerate NEL by taking two successful disambiguation features (popularity and context comparabilit...
Efficient Online Locality Sensitive Hashing via Reservoir Counting
We describe a novel mechanism called Reservoir Counting for application in online Locality Sensitive Hashing. This technique allows for significant savings in the streaming setting, allowing for maintaining a larger number of signatures, or an increased level of approximation accuracy at a similar memory footprint.
Scalable Techniques for Clustering the Web
Clustering is one of the most crucial techniques for dealing with the massive amount of information present on the web. Clustering can either be performed once offline, independent of search queries, or performed online on the results of search queries. Our offline approach aims to efficiently cluster similar pages on the web, using the technique of Locality-Sensitive Hashing (LSH), in which we...
Publication date: 2014